
Dropping the D: RGB-D SLAM Without the Depth Sensor

Kiray, Mert, Karaomer, Alican, Busam, Benjamin

arXiv.org Artificial Intelligence

We present DropD-SLAM, a real-time monocular SLAM system that achieves RGB-D-level accuracy without relying on depth sensors. The system replaces active depth input with three pretrained vision modules: a monocular metric depth estimator, a learned keypoint detector, and an instance segmentation network. Dynamic objects are suppressed using dilated instance masks, while static keypoints are assigned predicted depth values and backprojected into 3D to form metrically scaled features. These are processed by an unmodified RGB-D SLAM back end for tracking and mapping. On the TUM RGB-D benchmark, DropD-SLAM attains 7.4 cm mean ATE on static sequences and 1.8 cm on dynamic sequences, matching or surpassing state-of-the-art RGB-D methods while operating at 22 FPS on a single GPU. These results suggest that modern pretrained vision models can replace active depth sensors as reliable, real-time sources of metric scale, marking a step toward simpler and more cost-effective SLAM systems.
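
As a concrete illustration of the pipeline the abstract describes, the sketch below drops keypoints that fall inside dilated instance masks and backprojects the remaining static keypoints into metrically scaled 3D points using the predicted depth. The function name, dilation radius, and array conventions are illustrative assumptions, not code from the paper.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def backproject_static_keypoints(kps, depth, masks, K, dilate_iters=8):
    """kps: (N, 2) pixel coords (u, v); depth: HxW metric depth from a monocular
    estimator; masks: list of HxW boolean instance masks over dynamic objects;
    K: 3x3 camera intrinsics. Returns (M, 3) 3D points for static keypoints."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # Union of instance masks, dilated to also suppress boundary keypoints.
    dynamic = np.zeros(depth.shape, dtype=bool)
    for m in masks:
        dynamic |= binary_dilation(m, iterations=dilate_iters)
    u, v = kps[:, 0].astype(int), kps[:, 1].astype(int)
    keep = ~dynamic[v, u]                  # discard keypoints on dynamic objects
    z = depth[v[keep], u[keep]]            # predicted metric depth per keypoint
    x = (u[keep] - cx) * z / fx            # pinhole backprojection
    y = (v[keep] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)    # metrically scaled features for the back end
```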



Depth-PC: A Visual Servo Framework Integrated with Cross-Modality Fusion for Sim2Real Transfer

Zhang, Haoyu, Lin, Weiyang, Jiang, Yimu, Ye, Chao

arXiv.org Artificial Intelligence

Visual servo techniques guide robotic motion using visual information to accomplish manipulation tasks, which demand high precision and robustness against noise. Traditional methods often require prior knowledge and are susceptible to external disturbances. Learning-driven alternatives, while promising, frequently struggle with the scarcity of training data and fall short in generalization. To address these challenges, we propose Depth-PC, a novel visual servo framework that leverages simulation training and exploits the semantic and geometric information of keypoints from images, enabling zero-shot transfer to real-world servo tasks. Our framework centers on a servo controller that intertwines keypoint feature queries and relative depth information. The fused features from these two modalities are then processed by a Graph Neural Network to establish geometric and semantic correspondence between keypoints and update the robot state. Through simulation and real-world experiments, our approach demonstrates a larger convergence basin and higher accuracy than state-of-the-art methods, fulfilling the requirements of robotic servo tasks while enabling zero-shot application to real-world scenarios. Beyond these gains, we also substantiate the efficacy of cross-modality feature fusion for servo tasks.
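
To make the cross-modality fusion step concrete, here is a minimal, hypothetical PyTorch sketch that embeds relative depth, fuses it with keypoint descriptors, and runs one round of message passing over a fully connected keypoint graph. The module structure and layer sizes are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DepthKeypointFusion(nn.Module):
    """Fuses per-keypoint semantic features with relative depth, then applies a
    simple message-passing step over a fully connected keypoint graph."""
    def __init__(self, feat_dim=128, hidden=128):
        super().__init__()
        self.depth_embed = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 32))
        self.fuse = nn.Linear(feat_dim + 32, hidden)
        self.msg = nn.Linear(2 * hidden, hidden)   # message from sender j to receiver i
        self.update = nn.GRUCell(hidden, hidden)   # node-state update from aggregated messages

    def forward(self, feats, rel_depth):
        # feats: (N, feat_dim) keypoint descriptors; rel_depth: (N, 1) relative depth
        h = torch.relu(self.fuse(torch.cat([feats, self.depth_embed(rel_depth)], dim=-1)))
        n = h.size(0)
        hi = h.unsqueeze(1).expand(n, n, -1)       # receiver states
        hj = h.unsqueeze(0).expand(n, n, -1)       # sender states
        msgs = torch.relu(self.msg(torch.cat([hi, hj], dim=-1))).mean(dim=1)
        return self.update(msgs, h)                # updated per-keypoint states
```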


High-Resolution Flood Probability Mapping Using Generative Machine Learning with Large-Scale Synthetic Precipitation and Inundation Data

Huang, Lipai, Antolini, Federico, Mostafavi, Ali, Blessing, Russell, Garcia, Matthew, Brody, Samuel D.

arXiv.org Artificial Intelligence

High-resolution flood probability maps are essential for addressing the limitations of existing flood risk assessment approaches, but they are often constrained by the availability of historical event data. Moreover, producing the simulated data needed to create probabilistic flood maps with physics-based models requires significant computation and time, limiting feasibility. To address this gap, this study introduces Flood-Precip GAN (Flood-Precipitation Generative Adversarial Network), a novel methodology that leverages generative machine learning to simulate large-scale synthetic inundation data and produce probabilistic flood maps. Focusing on Harris County, Texas, Flood-Precip GAN begins by training a cell-wise depth estimator on a limited number of physics-based model-generated precipitation-flood events. This model, which emphasizes precipitation-based features, outperforms universal models. Subsequently, a Generative Adversarial Network (GAN) with constraints is employed to conditionally generate synthetic precipitation records. Strategic thresholds are established to filter these records, ensuring close alignment with true precipitation patterns. For each cell, synthetic events are smoothed using a K-nearest-neighbors algorithm and processed through the depth estimator to derive synthetic depth distributions. By iterating this procedure to generate 10,000 synthetic precipitation-flood events, we construct flood probability maps in various formats for different inundation depths. Validation through similarity and correlation metrics confirms the fidelity of the synthetic depth distributions relative to true data. Flood-Precip GAN provides a scalable solution for generating the synthetic flood depth data needed to create high-resolution flood probability maps, significantly enhancing flood preparedness and mitigation efforts.
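
The final mapping step reduces to a per-cell exceedance frequency over the generated events. A minimal sketch, assuming the synthetic depths are stacked into a single array; the array shape and depth thresholds are illustrative, not values from the paper.

```python
import numpy as np

def flood_probability_maps(depth_events, thresholds=(0.1, 0.5, 1.0)):
    """depth_events: (E, H, W) synthetic inundation depths in meters for E
    generated precipitation-flood events. Returns {threshold: HxW map} where
    each cell's probability is the fraction of events exceeding the threshold."""
    n_events = depth_events.shape[0]
    return {t: (depth_events > t).sum(axis=0) / n_events for t in thresholds}
```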


EvGGS: A Collaborative Learning Framework for Event-based Generalizable Gaussian Splatting

Wang, Jiaxu, He, Junhao, Zhang, Ziyi, Sun, Mingyuan, Sun, Jingkai, Xu, Renjing

arXiv.org Artificial Intelligence

Event cameras offer promising advantages such as high dynamic range and low latency, making them well suited to challenging lighting conditions and fast-moving scenarios. However, reconstructing 3D scenes from raw event streams is difficult because event data is sparse and carries no absolute color information. To unlock the potential of event data for 3D reconstruction, we propose the first event-based generalizable 3D reconstruction framework, EvGGS, which reconstructs scenes as 3D Gaussians from event input alone in a feedforward manner and generalizes to unseen cases without any retraining. The framework comprises a depth estimation module, an intensity reconstruction module, and a Gaussian regression module. These submodules are connected in a cascade, and we train them collaboratively with a designed joint loss so that they mutually reinforce one another. To facilitate related studies, we build a novel event-based 3D dataset with objects of various materials and calibrated labels for grayscale images, depth maps, camera poses, and silhouettes. Experiments show that jointly trained models significantly outperform those trained individually. Our approach surpasses all baselines in reconstruction quality and depth/intensity prediction, with satisfactory rendering speed.
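
The collaborative training the abstract describes amounts to optimizing the cascaded modules under one combined objective, so gradients from the rendering loss also reach the upstream depth and intensity heads. A hedged sketch of such a joint loss; the individual loss terms and weights are chosen for illustration, as the paper's exact formulation is not given in the abstract.

```python
import torch
import torch.nn.functional as F

def joint_loss(pred_depth, gt_depth, pred_intensity, gt_intensity,
               rendered, gt_image, w=(1.0, 1.0, 1.0)):
    """Weighted sum of per-module losses for the cascaded depth, intensity,
    and Gaussian-rendering modules, trained jointly end to end."""
    l_depth = F.l1_loss(pred_depth, gt_depth)            # depth estimation module
    l_intensity = F.l1_loss(pred_intensity, gt_intensity)  # intensity reconstruction module
    l_render = F.mse_loss(rendered, gt_image)            # Gaussian regression / rendering
    return w[0] * l_depth + w[1] * l_intensity + w[2] * l_render
```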


Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry

Kanai, Takayuki, Vasiljevic, Igor, Guizilini, Vitor, Shintani, Kazuhiro

arXiv.org Artificial Intelligence

Monocular visual odometry is a key technology in a wide variety of autonomous systems. Unlike traditional feature-based methods, which suffer failures due to poor lighting, insufficient texture, large motions, and similar conditions, recent learning-based SLAM methods exploit iterative dense bundle adjustment to address such failure cases, achieving robust and accurate localization in a wide variety of real environments without depending on domain-specific training data. However, despite this potential, learning-based SLAM still struggles with scenarios involving large motion and object dynamics. In this paper, we diagnose key weaknesses in a popular learning-based SLAM model (DROID-SLAM) by analyzing major failure cases on outdoor benchmarks and exposing shortcomings of its optimization process. We then propose using self-supervised priors from a frozen, large-scale pretrained monocular depth estimation model to initialize the dense bundle adjustment process, leading to robust visual odometry without the need to fine-tune the SLAM backbone. Despite its simplicity, our proposed method demonstrates significant improvements on KITTI odometry as well as the challenging DDAD benchmark. Code and pretrained models will be released upon publication.
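
The core idea, seeding dense bundle adjustment with a frozen depth prior rather than a constant initialization, can be sketched as below. The depth_model interface and the inverse-depth (disparity) convention are assumptions based on DROID-SLAM-style solvers, not the authors' released code.

```python
import torch

@torch.no_grad()
def init_disparity_from_prior(depth_model, image, eps=1e-6):
    """Seeds the dense bundle adjustment state with inverse depth predicted by a
    frozen, pretrained monocular depth network (no fine-tuning of the backbone)."""
    depth = depth_model(image).clamp(min=eps)  # (B, 1, H, W) predicted depth
    return 1.0 / depth                         # DROID-style solvers optimize disparity
```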


Introspective Perception for Mobile Robots

Rabiee, Sadegh, Biswas, Joydeep

arXiv.org Artificial Intelligence

Perception algorithms that provide estimates of their uncertainty are crucial to the development of autonomous robots that can operate in challenging and uncontrolled environments. Such algorithms enable risk-aware robots that reason about the probability of successfully completing a task when planning. Some perception algorithms do come with models of their uncertainty; however, these models are often developed under assumptions, such as perfect data association, that do not hold in the real world, so the resulting uncertainty estimates are weak lower bounds. To tackle this problem we present introspective perception: a novel approach for predicting accurate estimates of the uncertainty of perception algorithms deployed on mobile robots. By exploiting the sensing redundancy and consistency constraints naturally present in the data collected by a mobile robot, introspective perception learns an empirical model of the error distribution of perception algorithms in the deployment environment, in an autonomously supervised manner. In this paper, we present the general theory of introspective perception and demonstrate successful implementations for two different perception tasks. We provide empirical results on challenging real-robot data for introspective stereo depth estimation and introspective visual simultaneous localization and mapping, and show that both learn to predict their uncertainty with high accuracy and leverage this information to significantly reduce state estimation errors for an autonomous mobile robot.
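
One way to picture the autonomously supervised loop: pairs of (features of a perception output, error measured from redundant or temporally consistent sensing) train an empirical error regressor. The sketch below uses synthetic stand-in data and an off-the-shelf regressor purely for illustration; the paper's actual features and model family are not specified in the abstract.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Stand-in training data: features of perception outputs (e.g., statistics of the
# image patch around a stereo match) paired with errors measured autonomously
# from sensing redundancy / consistency checks during deployment.
X = rng.normal(size=(1000, 8))
y = np.abs(0.5 * X[:, 0] + rng.normal(scale=0.1, size=1000))

error_model = GradientBoostingRegressor().fit(X, y)  # empirical error-distribution model
uncertainty = error_model.predict(X[:5])             # per-output uncertainty estimates
```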


Monocular Visual-Inertial Depth Estimation

Wofk, Diana, Ranftl, René, Müller, Matthias, Koltun, Vladlen

arXiv.org Artificial Intelligence

Abstract: We present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry to produce dense depth estimates with metric scale. Our approach performs least-squares fitting of monocular depth estimates against sparse metric depth (global alignment), followed by learned local per-pixel adjustment (dense alignment). Global and dense (local) depth alignment successfully rectifies metric scale, with dense alignment consistently outperforming a purely global alignment baseline.

[Figure caption: Here, with GA+SML, objects are aligned more accurately, the center desk leg is straightened, and the top of the desk is pulled forward.]

From the introduction: Depth perception is fundamental to visual navigation, where correctly estimating distances can help plan motion and avoid obstacles. Accurate depth estimation can also aid scene reconstruction, mapping, and object manipulation; some applications benefit when estimated depth is metrically accurate, i.e., when every depth value is given in absolute units. Algorithms for dense depth estimation can be broadly grouped into several categories, including stereo-based approaches and structure-from-motion (SfM), which estimates scene geometry from a sequence of images taken by a moving camera. Works that use inertial data to inform metric scale typically perform depth completion given a set of known sparse metric depth points and tend to be self-supervised due to a lack of visual-inertial datasets [6], [7]. We seek to bridge these approaches by leveraging monocular depth estimation models trained on diverse datasets and recovering metric scale for individual depth estimates.
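
The global alignment step named in the abstract is a closed-form least-squares fit of a scale and shift against the sparse metric depth from visual-inertial odometry. A minimal sketch, fitting in depth space (whether the paper fits depth or inverse depth is not stated here):

```python
import numpy as np

def global_align(pred_depth, sparse_depth, mask):
    """Least-squares fit of scale s and shift t so that s * pred_depth + t matches
    sparse metric depth at the valid pixels given by mask (HxW boolean)."""
    d = pred_depth[mask].ravel()     # monocular depth at sparse metric points
    g = sparse_depth[mask].ravel()   # metric depth from visual-inertial odometry
    A = np.stack([d, np.ones_like(d)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s * pred_depth + t        # metrically rescaled dense depth
```

A learned per-pixel (dense) adjustment would then refine this globally aligned map, which is the step the paper reports as consistently outperforming global alignment alone.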


Robust Monocular Localization of Drones by Adapting Domain Maps to Depth Prediction Inaccuracies

Shukla, Priyesh, S., Sureshkumar, Stutts, Alex C., Ravi, Sathya, Tulabandhula, Theja, Trivedi, Amit R.

arXiv.org Artificial Intelligence

We present a novel monocular localization framework that jointly trains deep-learning-based depth prediction and Bayesian-filtering-based pose reasoning. The proposed cross-modal framework significantly outperforms deep-learning-only prediction in model scalability and tolerance to environmental variations. Specifically, we show little to no degradation of pose accuracy even with extremely poor depth estimates from a lightweight depth predictor. Our framework also maintains high pose accuracy under extreme lighting variations compared to standard deep learning, even without explicit domain adaptation. By explicitly representing the map and intermediate feature maps (such as depth estimates), our framework also allows faster updates and the reuse of intermediate predictions for other tasks, such as obstacle avoidance, resulting in much higher resource efficiency.
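
A hedged sketch of how a Bayesian filter can absorb even noisy learned depth: each pose hypothesis is reweighted by the agreement between the predicted depth and the depth the known map implies at that pose. The map_depth_at callable and the Gaussian likelihood are illustrative assumptions, not the paper's specific filter.

```python
import numpy as np

def reweight_particles(particles, weights, predicted_depth, map_depth_at, sigma=1.0):
    """Measurement update of a particle filter: score each pose particle by how
    well the learned depth prediction matches the depth rendered from the map."""
    for i, pose in enumerate(particles):
        expected = map_depth_at(pose)              # depth the map implies at this pose
        residual = predicted_depth - expected
        weights[i] *= np.exp(-0.5 * np.mean(residual ** 2) / sigma ** 2)
    weights /= weights.sum()                       # renormalize the posterior weights
    return weights
```

Because the pose estimate is carried by the filter rather than the network, a coarse depth predictor mainly widens the likelihood rather than biasing the pose, which is consistent with the robustness the abstract reports.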